{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_10_1_timeseries.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# T81-558: Applications of Deep Neural Networks\n",
    "**Module 10: Time Series in Keras**  \n",
    "**Part 10.1: Time Series Data Encoding for Deep Learning**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Module 10 Material\n",
    "\n",
    "* **Part 10.1: Time Series Data Encoding for Deep Learning** [[Video]](https://www.youtube.com/watch?v=dMUmHsktl04&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_10_1_timeseries.ipynb)\n",
    "* Part 10.2: Programming LSTM with Keras and TensorFlow [[Video]](https://www.youtube.com/watch?v=wY0dyFgNCgY&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_10_2_lstm.ipynb)\n",
    "* Part 10.3: Text Generation with Keras and TensorFlow [[Video]](https://www.youtube.com/watch?v=6ORnRAz3gnA&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_10_3_text_generation.ipynb)\n",
    "* Part 10.4: Introduction to Transformers [[Video]](https://www.youtube.com/watch?v=Z7FIdKVQ7kc&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_10_4_intro_transformers.ipynb)\n",
    "* Part 10.5: Transformers for Timeseries [[Video]](https://www.youtube.com/watch?v=SX67Mni0Or4&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_10_5_keras_transformers.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Google CoLab Instructions\n",
    "\n",
    "The following code ensures that Google CoLab is running the correct version of TensorFlow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: not using Google CoLab\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    %tensorflow_version 2.x\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "except:\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 10.1: Time Series Data Encoding\n",
    "\n",
    "There are many different methods to encode data over time to a neural network. In this chapter, we will examine time series encoding and recurrent networks, two topics that are logical to put together because they are both methods for dealing with data that spans over time. Time series encoding deals with representing events that occur over time to a neural network. This encoding is necessary because a feedforward neural network will always produce the same output vector for a given input vector. Recurrent neural networks do not require encoding time series data because they can automatically handle data that occur over time. \n",
    "\n",
    "The variation in temperature during the week is an example of time-series data. For instance, if we know that today’s temperature is 25 degrees Fahrenheit and tomorrow’s temperature is 27 degrees, the recurrent neural networks and time series encoding provide another option to predict the correct temperature for the week. Conversely, a traditional feedforward neural network will always respond with the same output for a given input. If we train a feedforward neural network to predict tomorrow’s temperature, it should return a value of 27 for 25. It will always output 27 when given 25 might hinder its predictions. Surely the temperature of 27 will not always follow 25. It would be better for the neural network to consider the temperatures for days before the prediction. Perhaps the temperature over the last week might allow us to predict tomorrow’s temperature. Therefore, recurrent neural networks and time series encoding represent two different approaches to representing data over time to a neural network.   \n",
    "\n",
    "Previously we trained neural networks with input ($x$) and expected output ($y$). $X$ was a matrix, the rows were training examples, and the columns were values to be predicted. The $x$ value will now contain sequences of data. The definition of the $y$ value will stay the same.\n",
    "\n",
    "Dimensions of the training set ($x$):\n",
    "* Axis 1: Training set elements (sequences) (must be of the same size as $y$ size)\n",
    "* Axis 2: Members of sequence\n",
    "* Axis 3: Features in data (like input neurons)\n",
    "\n",
    "Previously, we might take as input a single stock price to predict if we should buy (1), sell (-1), or hold (0). The following code illustrates this encoding."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[32], [41], [39], [20], [15]]\n",
      "[1, -1, 0, -1, 1]\n"
     ]
    }
   ],
   "source": [
    "x = [\n",
    "    [32],\n",
    "    [41],\n",
    "    [39],\n",
    "    [20],\n",
    "    [15]\n",
    "]\n",
    "\n",
    "y = [\n",
    "    1,\n",
    "    -1,\n",
    "    0,\n",
    "    -1,\n",
    "    1\n",
    "]\n",
    "\n",
    "print(x)\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following code builds a CSV file from scratch. To see it as a data frame, use the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[32 41 39 20 15]\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>x</th>\n",
       "      <th>y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>32</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>41</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>39</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>20</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>15</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    x  y\n",
       "0  32  1\n",
       "1  41 -1\n",
       "2  39  0\n",
       "3  20 -1\n",
       "4  15  1"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from IPython.display import display, HTML\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "x = np.array(x)\n",
    "print(x[:,0])\n",
    "\n",
    "\n",
    "df = pd.DataFrame({'x':x[:,0], 'y':y})\n",
    "display(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You might want to put volume in with the stock price.  The following code shows how to add a dimension to handle the volume."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[32, 1383], [41, 2928], [39, 8823], [20, 1252], [15, 1532]]\n",
      "[1, -1, 0, -1, 1]\n"
     ]
    }
   ],
   "source": [
    "x = [\n",
    "    [32,1383],\n",
    "    [41,2928],\n",
    "    [39,8823],\n",
    "    [20,1252],\n",
    "    [15,1532]\n",
    "]\n",
    "\n",
    "y = [\n",
    "    1,\n",
    "    -1,\n",
    "    0,\n",
    "    -1,\n",
    "    1\n",
    "]\n",
    "\n",
    "print(x)\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, very similar to what we did before.  The following shows this as a data frame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[32 41 39 20 15]\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>price</th>\n",
       "      <th>volume</th>\n",
       "      <th>y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>32</td>\n",
       "      <td>1383</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>41</td>\n",
       "      <td>2928</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>39</td>\n",
       "      <td>8823</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>20</td>\n",
       "      <td>1252</td>\n",
       "      <td>-1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>15</td>\n",
       "      <td>1532</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   price  volume  y\n",
       "0     32    1383  1\n",
       "1     41    2928 -1\n",
       "2     39    8823  0\n",
       "3     20    1252 -1\n",
       "4     15    1532  1"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from IPython.display import display, HTML\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "x = np.array(x)\n",
    "print(x[:,0])\n",
    "\n",
    "\n",
    "df = pd.DataFrame({'price':x[:,0], 'volume':x[:,1], 'y':y})\n",
    "display(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we get to sequence format. We want to predict something over a sequence, so the data format needs to add a dimension. You must specify a maximum sequence length. The individual sequences can be of any size."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[[32, 1383], [41, 2928], [39, 8823], [20, 1252], [15, 1532]], [[35, 8272], [32, 1383], [41, 2928], [39, 8823], [20, 1252]], [[37, 2738], [35, 8272], [32, 1383], [41, 2928], [39, 8823]], [[34, 2845], [37, 2738], [35, 8272], [32, 1383], [41, 2928]], [[32, 2345], [34, 2845], [37, 2738], [35, 8272], [32, 1383]]]\n",
      "[1, -1, 0, -1, 1]\n"
     ]
    }
   ],
   "source": [
    "x = [\n",
    "    [[32,1383],[41,2928],[39,8823],[20,1252],[15,1532]],\n",
    "    [[35,8272],[32,1383],[41,2928],[39,8823],[20,1252]],\n",
    "    [[37,2738],[35,8272],[32,1383],[41,2928],[39,8823]],\n",
    "    [[34,2845],[37,2738],[35,8272],[32,1383],[41,2928]],\n",
    "    [[32,2345],[34,2845],[37,2738],[35,8272],[32,1383]],\n",
    "]\n",
    "\n",
    "y = [\n",
    "    1,\n",
    "    -1,\n",
    "    0,\n",
    "    -1,\n",
    "    1\n",
    "]\n",
    "\n",
    "print(x)\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Even if there is only one feature (price), you must use 3 dimensions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[[32], [41], [39], [20], [15]], [[35], [32], [41], [39], [20]], [[37], [35], [32], [41], [39]], [[34], [37], [35], [32], [41]], [[32], [34], [37], [35], [32]]]\n",
      "[1, -1, 0, -1, 1]\n"
     ]
    }
   ],
   "source": [
    "x = [\n",
    "    [[32],[41],[39],[20],[15]],\n",
    "    [[35],[32],[41],[39],[20]],\n",
    "    [[37],[35],[32],[41],[39]],\n",
    "    [[34],[37],[35],[32],[41]],\n",
    "    [[32],[34],[37],[35],[32]],\n",
    "]\n",
    "\n",
    "y = [\n",
    "    1,\n",
    "    -1,\n",
    "    0,\n",
    "    -1,\n",
    "    1\n",
    "]\n",
    "\n",
    "print(x)\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Module 10 Assignment\n",
    "\n",
    "You can find the first assignment here: [assignment 10](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/assignments/assignment_yourname_class10.ipynb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3.9 (tensorflow)",
   "language": "python",
   "name": "tensorflow"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
